109 research outputs found

    Pre-processing of tandem mass spectra using machine learning methods

    Protein identification has become increasingly helpful in the diagnosis and treatment of many diseases, such as cancer, heart disease and HIV. Tandem mass spectrometry is a powerful tool for protein identification. In a typical experiment, proteins are broken into small amino acid oligomers called peptides. By determining the amino acid sequences of several peptides of a protein, its whole amino acid sequence can be inferred. Peptide identification is therefore the first step and a central issue in protein identification. Tandem mass spectrometers can produce a large number of tandem mass spectra, which are used for peptide identification. Two issues should be addressed to improve the performance of current peptide identification algorithms. First, nearly all spectra are contaminated by noise, so the accuracy of peptide identification algorithms may suffer. Second, the majority of spectra are not identifiable because their quality is too poor, so much time is wasted attempting to identify them. The goal of this research is to design spectrum pre-processing algorithms that both speed up peptide identification from tandem mass spectra and improve its reliability. First, because a tandem mass spectrum is a one-dimensional signal consisting of dozens to hundreds of peaks, and the majority of these peaks are noise, a spectrum denoising algorithm is proposed to remove most noisy peaks. Experimental results show that our denoising algorithm can remove about 69% of the peaks in a spectrum, which are potential noisy peaks. At the same time, the number of spectra that can be identified by the Mascot algorithm increases by 31% and 14% for two tandem mass spectrum datasets. Second, a two-stage recursive feature elimination method based on support vector machines (SVM-RFE) and a sparse logistic regression method are proposed to select the most relevant features for describing the quality of tandem mass spectra. Our methods can effectively select the most relevant features, as judged by the performance of classifiers trained with different numbers of features. Third, both supervised and unsupervised machine learning methods are used for the quality assessment of tandem mass spectra. A supervised classifier (a support vector machine) can be trained to remove more than 90% of poor-quality spectra without removing more than 10% of high-quality spectra. Clustering methods such as model-based clustering are also used for quality assessment to remove the need for a labeled training dataset, and they show promising results
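To make the idea of removing noisy peaks concrete, the sketch below keeps only the most intense peaks in each m/z window. This is a common heuristic and not the denoising algorithm described in the abstract; the function name `filter_peaks`, the window width and the `top_k` value are all illustrative assumptions.

```python
from collections import defaultdict


def filter_peaks(peaks, window_width=100.0, top_k=6):
    """Keep the top_k most intense peaks in each m/z window.

    peaks: list of (mz, intensity) tuples.
    window_width and top_k are hypothetical defaults, not values
    taken from the abstract.
    """
    # Group peaks by the index of the m/z window they fall into.
    windows = defaultdict(list)
    for mz, intensity in peaks:
        windows[int(mz // window_width)].append((mz, intensity))

    kept = []
    for bucket in windows.values():
        # Sort each window by descending intensity and keep the strongest.
        bucket.sort(key=lambda p: p[1], reverse=True)
        kept.extend(bucket[:top_k])

    # Return the surviving peaks ordered by m/z.
    return sorted(kept)
```

Calling `filter_peaks(spectrum, window_width=100.0, top_k=2)` on a list of `(mz, intensity)` pairs returns at most two peaks per 100-unit window, so low-intensity peaks in crowded regions are discarded first.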

    Erythrocyte transfusion limits the role of elevated red cell distribution width on predicting cardiac surgery associated acute kidney injury

    Background: Acute kidney injury (AKI) is one of the more serious complications after cardiac surgery. Elevated red cell distribution width (RDW) has been reported as a predictor of cardiac surgery associated acute kidney injury (CSAKI). However, because erythrocyte transfusion itself increases RDW, its prognostic role is doubtful. The aim of this study is to elucidate the impact of erythrocyte transfusion on the prognostic role of elevated RDW for predicting CSAKI. Methods: A total of 3207 eligible patients who underwent cardiac surgery during 2016–2017 were enrolled. The change in RDW was defined as the difference between preoperative RDW and RDW measured 24 h after cardiac surgery. The primary outcome was CSAKI, defined by the Kidney Disease: Improving Global Outcomes Definition and Staging (KDIGO) criteria. Univariate and multivariate analyses were performed to identify predictors of CSAKI. Results: The incidence of CSAKI was 38.07% and the mortality was 1.18%. Patients with CSAKI had a greater elevation in RDW than those without CSAKI (0.65% vs. 0.39%, p < 0.001). Multivariate regression showed that male sex, age, New York Heart Association classification 3–4, elevated RDW, estimated glomerular filtration rate < 60 mL/min/1.73 m², cardiopulmonary bypass time > 120 min and erythrocyte transfusion were associated with CSAKI. Subgroup analysis showed that elevated RDW was an independent predictor of CSAKI in the non-transfused subset (adjusted odds ratio: 1.616, p < 0.001), whereas no significant association between elevated RDW and CSAKI was found in the transfused patients (odds ratio: 1.040, p = 0.497). Conclusions: Elevated RDW is an independent predictor of CSAKI in the absence of erythrocyte transfusion; transfusion limits the prognostic role of elevated RDW in predicting CSAKI

    Tetraploidy in Citrus wilsonii Enhances Drought Tolerance via Synergistic Regulation of Photosynthesis, Phosphorylation, and Hormonal Changes

    Polyploid varieties have been reported to exhibit higher stress tolerance than their diploid relatives; however, the underlying molecular and physiological mechanisms remain poorly understood. In this study, a batch of autotetraploid Citrus wilsonii seedlings was identified from a natural seedling population, and these tetraploid seedlings exhibited greater tolerance to drought stress than their diploid siblings. A global transcriptome analysis revealed that a large number of genes involved in the photosynthesis response were enriched in tetraploids under drought stress, which was consistent with the changes in photosynthetic indices including Pn, gs, Tr, Ci, and chlorophyll contents. Compared with diploids, phosphorylation was also modified in the tetraploids after drought stress, as detected through tandem mass tag (TMT)-labeled proteomics. Additionally, tetraploids prioritized the regulation of plant hormone signal transduction at the transcriptional level after drought stress, which was also demonstrated by increased levels of IAA, ABA, and SA and reduced levels of GA3 and JA. Collectively, our results confirmed that the synergistic regulation of photosynthesis response, phosphorylation modification and plant hormone signaling resulted in the drought tolerance of autotetraploid C. wilsonii germplasm

    Alcohol consumption, DNA methylation and colorectal cancer risk:Results from pooled cohort studies and Mendelian randomization analysis

    Alcohol consumption is thought to be one of the modifiable risk factors for colorectal cancer (CRC). However, the causality and the mechanisms by which alcohol exerts its carcinogenic effect are unclear. We evaluated the association between alcohol consumption and CRC risk by analyzing data from 32 cohort studies and conducted a two-sample Mendelian randomization (MR) analysis to examine the causal relationship. To explore the effect of alcohol-related DNA methylation on CRC risk, we performed an epigenetic MR analysis with data from an epigenome-wide association study (EWAS). We additionally performed a gene-alcohol interaction analysis nested in the UK Biobank to assess effect modification between alcohol consumption and susceptibility genes. We discovered distinct effects of alcohol on CRC incidence and mortality from the meta-analyses, and genetic predisposition to alcohol drinking was causally associated with an increased CRC risk (OR = 1.79, 95% CI: 1.23-2.61) using two-sample MR approaches. In the epigenetic MR analysis, two alcohol-related CpG sites (cg05593667 and cg10045354, mapped to the COLCA1/COLCA2 gene) were identified as causally associated with an increased CRC risk (P < 8.20 × 10⁻⁴). The gene-alcohol interaction analysis revealed that carriage of the risk allele of the eQTL (rs3087967) and mQTL (rs11213823) polymorphisms of COLCA1/COLCA2 would interact with alcohol consumption to increase CRC risk (P_interaction = .027 and P_interaction = .016). Our study provides comprehensive evidence to elucidate the role of alcohol in CRC and highlights that the pathogenic effect of alcohol on CRC could be partly attributed to DNA methylation through regulation of the expression of the COLCA1/COLCA2 gene

    Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data

    Motivation: The study of cancer genomes now routinely involves using next-generation sequencing technology (NGS) to profile tumours for single nucleotide variant (SNV) somatic mutations. However, surprisingly few published bioinformatics methods exist for the specific purpose of identifying somatic mutations from NGS data and existing tools are often inaccurate, yielding intolerably high false prediction rates. As such, the computational problem of accurately inferring somatic mutations from paired tumour/normal NGS data remains an unsolved challenge

    Single cell transcriptome analysis reveals disease-defining T cell subsets in the tumor microenvironment of classic Hodgkin lymphoma

    Hodgkin lymphoma is characterized by an extensively dominant tumor microenvironment (TME) composed of different types of noncancerous immune cells with rare malignant cells. Characterization of the cellular components and their spatial relationship is crucial to understanding cross-talk and therapeutic targeting in the TME. We performed single-cell RNA sequencing of more than 127,000 cells from 22 Hodgkin lymphoma tissue specimens and 5 reactive lymph nodes, profiling for the first time the phenotype of the Hodgkin lymphoma–specific immune microenvironment at single-cell resolution. Single-cell expression profiling identified a novel Hodgkin lymphoma–associated subset of T cells with prominent expression of the inhibitory receptor LAG3, and functional analyses established this LAG3+ T-cell population as a mediator of immunosuppression. Multiplexed spatial assessment of immune cells in the microenvironment also revealed increased LAG3+ T cells in the direct vicinity of MHC class II–deficient tumor cells. Our findings provide novel insights into TME biology and suggest new approaches to immune-checkpoint targeting in Hodgkin lymphoma. SIGNIFICANCE: We provide detailed functional and spatial characteristics of immune cells in classic Hodgkin lymphoma at single-cell resolution. Specifically, we identified a regulatory T-cell–like immunosuppressive subset of LAG3+ T cells contributing to the immune-escape phenotype. Our insights aid in the development of novel biomarkers and combination treatment strategies targeting immune checkpoints

    Computational methods for systems biology data of cancer

    High-throughput genome sequencing and other techniques provide a cost-effective way to study cancer biology and seek precision treatment options. In this dissertation I address three challenges in cancer systems biology research: 1) predicting somatic mutations, 2) interpreting mutation functions, and 3) stratifying patients into biologically meaningful groups. Somatic single nucleotide variants are frequent therapeutically actionable mutations in cancer, e.g., the ‘hotspot’ mutations in known cancer driver genes such as EGFR, KRAS, and BRAF. However, only a small proportion of cancer patients harbour these known driver mutations. Therefore, there is a great need to systematically profile a cancer genome to identify all the somatic single nucleotide variants. I develop methods to discover these somatic mutations from cancer genomic sequencing data, taking into account the noise in high-throughput sequencing data and valuable validated genuine somatic mutations and non-somatic mutations. Of the somatic alterations acquired for each cancer patient, only a few mutations ‘drive’ the initialization and progression of cancer. To better understand the evolution of cancer, as well as to apply precision treatments, we need to assess the functions of these mutations to pinpoint the driver mutations. I address this challenge by predicting the mutations correlated with gene expression dysregulation. The method is based on hierarchical Bayes modelling of the influence of mutations on gene expression, and can predict the mutations that impact gene expression in individual patients. Although probably no two cancer genomes share exactly the same set of somatic mutations because of the stochastic nature of acquired mutations across the three billion base pairs, some cancer patients share common driver mutations or disrupted pathways. These patients may have similar prognoses and potentially benefit from the same kind of treatment options. 
I develop an efficient clustering algorithm to cluster high-throughput and high-dimensional biological datasets, with the potential to put cancer patients into biologically meaningful groups for treatment selection. Science, Faculty of; Computer Science, Department of; Graduat

    Signal Detection Methods Based on Less Matrix Inversion for Massive MIMO Systems

    In this paper, we present several methods that reduce the computational complexity of signal detection for uplink massive MIMO systems. We then show and analyse the simulation performance of these methods. Finally, we propose improvements for better performance
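The abstract does not name its methods. As one hedged illustration of how exact matrix inversion can be avoided in linear detection, the sketch below uses a truncated Neumann series, a widely known approximation for diagonally dominant matrices. The choice of technique, the function names and the real-valued example are assumptions, not details from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def neumann_inverse(a, terms=3):
    """Approximate the inverse of a diagonally dominant square matrix.

    Writes A = D + E (diagonal plus off-diagonal part) and uses
    A^{-1} ~= sum_{k=0}^{terms-1} (-D^{-1} E)^k D^{-1}.
    The series converges only if the spectral radius of D^{-1} E is below 1.
    """
    n = len(a)
    # Inverting the diagonal part D only needs n scalar divisions.
    d_inv = [[1.0 / a[i][i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    # Off-diagonal part E = A - D.
    e = [[0.0 if i == j else a[i][j] for j in range(n)] for i in range(n)]
    # Iteration matrix M = -D^{-1} E.
    m = [[-x for x in row] for row in matmul(d_inv, e)]

    term = d_inv  # k = 0 term of the series
    approx = [row[:] for row in d_inv]
    for _ in range(1, terms):
        term = matmul(m, term)  # next term: M^k D^{-1}
        approx = [[approx[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return approx
```

In a detector, `a` would stand for the regularised Gram matrix of the channel, which tends to be diagonally dominant when the number of base-station antennas greatly exceeds the number of users. A small `terms` value trades accuracy for lower complexity.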